Representation of texts as complex networks: a mesoscopic approach
Authors
Abstract
Texts are complex structures emerging from an intricate system of syntactical constraints and semantical relationships. While the complete modeling of such structures is impractical owing to the high level of complexity inherent to linguistic constructions, certain tasks can still be performed within a limited domain. Recently, statistical techniques for the analysis of texts, referred to as text analytics, have moved away from simple word-count statistics towards a new paradigm. Text mining now hinges on a more sophisticated set of methods, including the representation of texts as complex networks. In this perspective, networks represent a set of textual elements, typically words, and links are established via adjacency relationships. While current word-adjacency (co-occurrence) methods successfully grasp syntactical and stylistic features of written texts, they are unable to represent important aspects of textual data, such as its topical structure. As a consequence, the mesoscopic structure of texts is often overlooked by current methodologies. To grasp mesoscopic characteristics of semantical content in written texts, we devised a network approach that can analyze documents in a multi-scale, mesoscopic fashion. In the proposed model, each node represents a limited number of adjacent paragraphs, and nodes are connected whenever they share a minimum semantical content. To illustrate the capabilities of our model, we present, as a use case, a qualitative analysis of “Alice’s Adventures in Wonderland”, a novel by Lewis Carroll. We show that the mesoscopic structure of documents modeled as networks reveals many semantic traits of texts, a feature that could be explored in a myriad of semantic-based applications.
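The construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how shared semantical content is measured, so the cosine similarity over raw word counts, the sliding (overlapping) windows, the choice to compare only windows that share no paragraph, and the names `mesoscopic_network`, `window`, and `threshold` are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations
from math import sqrt


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mesoscopic_network(paragraphs, window=2, threshold=0.3):
    """Build a paragraph-window network (illustrative assumptions only).

    Node i covers paragraphs i .. i + window - 1. Two nodes are linked when
    their windows share no paragraph and their word-count vectors have a
    cosine similarity of at least `threshold`.
    """
    n_nodes = max(len(paragraphs) - window + 1, 0)
    nodes = [
        Counter(tokenize(" ".join(paragraphs[i:i + window])))
        for i in range(n_nodes)
    ]
    edges = set()
    for i, j in combinations(range(n_nodes), 2):
        if j - i < window:
            continue  # overlapping windows are trivially similar; skip them
        if cosine(nodes[i], nodes[j]) >= threshold:
            edges.add((i, j))
    return n_nodes, edges


if __name__ == "__main__":
    sample = [
        "Alice followed the white rabbit down the hole.",
        "The rabbit checked its watch and hurried on.",
        "The queen ordered the cards to paint the roses.",
        "The cards trembled before the queen.",
    ]
    n, links = mesoscopic_network(sample, window=2, threshold=0.2)
    print(n, sorted(links))
```

Raw counts let frequent function words inflate the similarity, so a practical version would likely weight or filter terms before comparing windows; that step is left out here to keep the sketch short.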
Similar resources
A Fast Approach to the Detection of All-Purpose Hubs in Complex Networks with Chemical Applications
A novel algorithm for the fast detection of hubs in chemical networks is presented. The algorithm identifies a set of nodes in the network as most significant, aimed to be the most effective points of distribution for fast, widespread coverage throughout the system. We show that our hubs have in general greater closeness centrality and betweenness centrality than vertices with maximal degree, w...
On the "Calligraphy" of Books
Authorship attribution is a natural language processing task that has been widely studied, often by considering small order statistics. In this paper, we explore a complex network approach to assign the authorship of texts based on their mesoscopic representation, in an attempt to capture the flow of the narrative. Indeed, as reported in this work, such an approach allowed the identification of...
Explaining the Methods of Architecture Representation Using Semiotic Analysis (Umberto Eco's Theory of Architecture Codes)
This paper attempts to explain the concept of representation, and of architectural representation in particular, through a qualitative methodology. It approaches the procedure of its gradual creation in architecture and then presents the concepts, according to scholars, in order to obtain the effect of this concept in the process of architectural facts. In addition, it is referred to theories and practical texts b...
Complex networks: Statics and Dynamics
We present some of the results obtained during the last 8 years about complex networks. Starting with the collection of data in the form of networks or graphs, we proceed to their characterization at different scales: microscopic, macroscopic, and mesoscopic. We also introduce the basic models incorporating complexity in the pattern of connectivities. Finally we review some results on dynamical f...
Situation and Text: Representation of Migrants Whilst the Escalation of Refugee Crisis in Great Britain as Compared to Russia
Increasing migration is a vital concern for a globalizing sociocultural environment in today’s world. The UK and developed European countries have become an attractive destination for asylum seekers (labelled as “migrants”) in the past decade. The rapid rise in the number of asylum seekers, which was labelled “migration crisis” (Ruz, 2015), made this topic an integral part of scientific discuss...
Journal:
- J. Complex Networks
Volume 6, Issue
Pages -
Publication date: 2018